{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Iris数据集有四个维度（特征），对其做降维操作后可以更容易地做数据可视化"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "降维操作的目标就是找出是否有一个合适的低维数据，能够保留原有数据的重要特征。数据降维通常用于辅助数据的可视化操作，对两个维度的数据做可视化操作比对四个维度甚至更多维度的数据做可视化操作更容易。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "这里使用了 Principal component analysis (PCA) 算法，这是一个快速的线性降维技术。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# 导入Iris数据集。Scikit-Learn自带了这个经典的数据集\n",
    "from sklearn import datasets\n",
    "iris = datasets.load_iris()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 导入PCA类\n",
    "from sklearn.decomposition import PCA\n",
    "\n",
    "# 实例化模型，设置超参数n_components等于2,表示生成两维的结果，即两个特征\n",
    "model = PCA(n_components=2)\n",
    "\n",
    "# 训练模型。这里没有对数据做训练集/测试集的切分，也没有用到目标阵列y\n",
    "model.fit(iris.data)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# 用训练好的模型对数据做降维操作，从原有的4个维度生成一个2个维度的数据\n",
    "X2d = model.transform(iris.data)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 用seaborn.lmplot来展示结果\n",
    "import seaborn as sns\n",
    "df = DataFrame({'pca1': X2d[:,0], 'pca2': X2d[:,1], 'species': iris.target})\n",
    "sns.lmplot('pca1', 'pca2', hue='species', data=df, fit_reg=False)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 类似的matplotlib命令如下：\n",
    "import matplotlib.pyplot as plt\n",
    "df = DataFrame({'A': X2d[:,0], 'B': X2d[:,1], 'C': iris.target})\n",
    "C = df.C.map({0: 'b', 1: 'orange', 2: 'g'})\n",
    "plt.scatter(df.A, df.B, c=C)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 虽然不借助种类标签，PCA算法降维所生成的二维数据打出的图形中，不同的种类之间分离得很好，这意味着一个分类算法很可能可以很好地应用于这份数据上。"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.2"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
